Available online at www.prace-ri.eu
Partnership for Advanced Computing in Europe
GROMACS on Hybrid CPU-GPU and CPU-MIC Clusters: Preliminary Porting Experiences, Results and Next Steps
Authors
Abstract
This report introduces the hybrid implementation of the GROMACS application and provides instructions on building and executing it on PRACE prototype platforms with Graphical Processing Unit (GPU) and Many Integrated Core (MIC) accelerator technologies. GROMACS currently employs message-passing parallelism with MPI and multi-threading with OpenMP, and contains kernels for non-bonded interactions that are accelerated using the CUDA programming language. As a result, the execution model is multi-faceted, and end users can tune the application's execution to the underlying platform. We present results collected on the PRACE prototype systems as well as on other GPU- and MIC-accelerated platforms with similar configurations. We also report on a preliminary porting effort toward a fully portable implementation of GROMACS that uses the OpenCL programming language instead of CUDA, which is only available on NVIDIA GPU devices.
Similar resources
Available on-line at www.prace-ri.eu Partnership for Advanced Computing in Europe OpenMP Parallelization of the Slilab Code
This white paper describes the parallelization of the Slilab code with OpenMP for a shared-memory execution model, focusing on multiphase flow simulations such as fiber suspensions in turbulent channel flows. In such problems the "second phase fibre" frequently crosses the distributed domain boundary of the "first phase fluid", which in turn reduces the work-bal...
Available online at www.prace-ri.eu Partnership for Advanced Computing in Europe Power instrumentation of task-based applications using model-specific registers on the Sandy Bridge architecture
This whitepaper describes the technical side of a research work into the energy-efficiency tradeoffs of task-based execution with vectorization, through the application of recently available model-specific registers for counting energy use. It describes the mechanisms used to extract energy figures with respect to architectural and operating system concerns, and illustrates their utility in the...
An OpenMP Programming Toolkit for Hybrid CPU/GPU Clusters Based on Software Unified Memory
Recently, hybrid CPU/GPU clusters have drawn much attention from researchers in high-performance computing because of their energy efficiency and adaptable resource exploitation. However, programming hybrid CPU/GPU clusters is very complex because it requires users to learn new programming interfaces such as CUDA and OpenCL and to combine them with MPI and OpenMP. To address this probl...
Available online at www.prace-ri.eu Partnership for Advanced Computing in Europe MapReduce-based Parallelization of Sparse Matrix Kernels for Large-scale Scientific Applications
This whitepaper addresses applicability of the MapReduce paradigm for scientific computing by realizing it on the widely used sparse matrix-vector multiplication (SpMV) operation with a recent library developed for this purpose. Scaling SpMV operations proves vital as it is a kernel that finds its applications in many scientific problems from different domains. Generally, the scalability improv...
Partnership for Advanced Computing in Europe Accelerator Aware MPI Micro-benchmarking using CUDA, OpenACC and OpenCL
Recently, MPI implementations have been extended to support accelerator devices such as the Intel Many Integrated Core (MIC) and NVIDIA GPUs. This has been accomplished by changes at different levels of the software stacks and MPI implementations. In order to evaluate the performance and scalability of accelerator-aware MPI libraries, we developed portable micro-benchmarks to identify factors that influence ef...